<HTML>
<HEAD>
<TITLE>Abstracts of the Horus Papers</TITLE>
</HEAD>
<BODY BACKGROUND=images/per3.jpg TEXT=#000000>

<TABLE WIDTH=800 BORDER=0>
<TR>
<TD WIDTH=140 VALIGN=TOP ROWSPAN=1 ALIGN=LEFT>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/ud8.jpg"><BR>
<P>
<BR>
<P>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/HORUS/index.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-home.jpg"></A><BR>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/HORUS/Overview.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-overview.jpg"></A><BR>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/HORUS/People.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-people.jpg"></A><BR>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/HORUS/Papers.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-papers.jpg"></A><BR>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/HORUS/Software.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-software.jpg"></A><BR>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/HORUS/Links.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-links.jpg"></A><BR>
<P>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/recent2.jpg">
<P>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/sciam2.jpg">
<P>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/sigcomm.jpg">
<P>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/cacm2.jpg">
<P>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/tina.jpg">
<P>
<BR>
<P>
<IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/ensicon2.gif">
<P>
<A HREF="http://simon.cs.cornell.edu/Info/Projects/Ensemble/index.html"><IMG BORDER=0 SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/button-ens-home.jpg"></A>
<P>
</TD>
<TD WIDTH=640>
<CENTER>
<IMG SRC="http://simon.cs.cornell.edu/Info/Projects/HORUS/images/papers.jpg"><BR>
</CENTER>
<CENTER>
<H2>Abstracts</H2>
<P>
</CENTER>
<HR>
<A NAME="sciam"></A>
<H2>Software for Reliable Networks</H2>
Kenneth P. Birman and Robbert van Renesse<BR>
Scientific American, May, 1996<BR>
<P>
The failure of a single program on a single computer can sometimes crash a 
network of intercommunicating machines, causing havoc for stock exchanges, 
telephone systems, air-traffic control and other operations. Two software
designers explain what can be done to make networks more robust. 
<P>
<HR>
<P>
<A NAME="cacm"></A>
<H2>Horus, a flexible Group Communication System</H2>
Robbert van Renesse, Kenneth P. Birman and Silvano Maffeis<BR>
Communications of the ACM, April 1996.
<P>
The emergence of process-group environments for distributed computing
represents a promising step towards 
robustness for mission-critical distributed applications.
Process groups have a ``natural'' correspondence with data or services 
that have been replicated
for availability, or as part of a coherent cache.
They can be used to support highly available security
domains. Group mechanisms also fit well with an emerging generation of
intelligent network and collaborative work applications.
<P>
<HR>
<P>
<A NAME="hpa"></A>
<H2>Masking the Overhead of Protocol Layering</H2>
Robbert van Renesse<BR>
Proceedings of the 1996 ACM SIGCOMM Conference<BR>
Stanford, September 1996<BR>
<P>
Layering of protocols has been advocated as a way of dealing with the
complexity of computer communication.
It has also been criticized for its performance overhead.
In this paper, we present some insights into the design of protocols, and
how these insights can be used to mask the overhead of layering, in a
way similar to client caching in a file system.
With our techniques, we achieve an order of magnitude improvement in
end-to-end message latency in the Horus communication framework.
Over an ATM network, we are able to send and deliver messages of varying
levels of semantics in about 85 microseconds, using a protocol stack of
four layers that were written in ML, a high-level functional language.
<P>
<!--
<HR>
<P>
<A NAME="rtss7"></A>
<H2>Using Group Communication Technology to Implement a Reliable and
Scalable Distributed IN Coprocessor</H2>
Roy Friedman and Ken Birman<BR>
September 1996<BR>
<P>
In this paper we explore the use of group communication technology,
developed in the Horus project, to implement a reliable and scalable
distributed IN coprocessor.
The proposed implementation can handle up to 20,000 calls per second
with 12 computing nodes, can tolerate a single node failure or recovery,
and can recover from periods of overload.
<P>
-->
<HR>
<A NAME="worldwidefailures"></A>
<P>
<H2>World Wide Failures</H2>
Werner Vogels<BR>
Proceedings of the ACM SIGOPS European Workshop, Connemara, Ireland,
September 1996 <BR>
<P>
The one issue that unites almost all approaches to distributed computing
is the need to know whether certain components in the system have failed or
are otherwise unavailable. When designing and building systems that
will need to function at a global scale, failure management will need to
be considered a fundamental building block. This paper describes the 
development of a system-independent failure management service, which allows
systems and applications to incorporate accurate detection of failed
processes, nodes and networks without the need for making compromises in their
particular design.
<P>
<HR>
<A NAME="framework"></A>
<P>
<H2>A Framework for Protocol Composition in Horus</H2>
<P>
Robbert van Renesse, Kenneth P. Birman, Roy Friedman, <BR>
Mark Hayden, and David A. Karr<BR>
August 1995
<P>
The Horus system supports a communication architecture that treats
protocols as instances of an abstract data type.
This approach encourages developers to
partition complex protocols into simple microprotocols, each of which
is implemented by a protocol layer.
Protocol layers can be stacked on top of each
other in a variety of ways, at run-time.
First, we describe the classes of protocols that can be supported this way.
Next, we present the Horus object model that we designed for
this technology, and the interface between the layers that makes it all
work.  We then present an example layer that implements a group
membership protocol.  Next, we show how, given a set of required properties,
an appropriate stack can be constructed.  We look at an example stack of
protocols, which provides fault-tolerant, totally ordered communication
between a group of processes.
The work contributes a standard framework for protocol development and
experimentation, provides a high performance implementation of the
virtual synchrony model, and introduces a methodology for increasing the
robustness of the protocol development process.
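<P>
As a rough illustration of the layering idea in this abstract, the following
minimal Python sketch stacks two toy microprotocol layers at run-time. It is
not Horus code: the class names (Layer, SequenceLayer, ChecksumLayer, Stack)
and the dictionary message format are assumptions made only for this example.
<PRE>
```python
class Layer:
    """Hypothetical microprotocol layer with a uniform send/deliver interface."""

    def send(self, msg):
        # Called on the way down the stack; the default passes the message through.
        return msg

    def deliver(self, msg):
        # Called on the way up the stack; the default passes the message through.
        return msg


class SequenceLayer(Layer):
    """Adds a sequence-number header on send and strips it on deliver."""

    def __init__(self):
        self.next_seq = 0

    def send(self, msg):
        out = dict(msg, seq=self.next_seq)
        self.next_seq += 1
        return out

    def deliver(self, msg):
        out = dict(msg)
        out.pop("seq", None)
        return out


class ChecksumLayer(Layer):
    """Adds a toy checksum header on send and verifies it on deliver."""

    @staticmethod
    def _checksum(body):
        return sum(body.encode()) % 256

    def send(self, msg):
        return dict(msg, checksum=self._checksum(msg["body"]))

    def deliver(self, msg):
        out = dict(msg)
        expected = out.pop("checksum")
        if self._checksum(out["body"]) != expected:
            raise ValueError("corrupted message")
        return out


class Stack:
    """Composes layers at run-time; the first layer in the list is the top."""

    def __init__(self, layers):
        self.layers = list(layers)

    def send(self, msg):
        # Messages travel from the top layer down to the bottom layer.
        for layer in self.layers:
            msg = layer.send(msg)
        return msg

    def deliver(self, msg):
        # Received messages travel from the bottom layer back up to the top.
        for layer in reversed(self.layers):
            msg = layer.deliver(msg)
        return msg


stack = Stack([SequenceLayer(), ChecksumLayer()])
wire = stack.send({"body": "hello"})
received = stack.deliver(wire)
```
</PRE>
Because every layer exposes the same two operations, a different stack can be
built simply by passing a different list of layers to Stack.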
<P><HR>
<A NAME="tr95-1579"></A>
<H2>Trading Consistency for Availability in Distributed Systems</H2>
Roy Friedman and  Ken Birman<BR>
TR96-1579<BR>
April 8, 1996<BR>
<p>
This paper shows that two important classes of actions, non-left-commuting 
and strongly non-commuting, cannot be executed by concurrent partitions in 
a system that provides serializable services. This result indicates that 
there is an inherent limitation to the ability of systems to provide services 
in a consistent manner during network partitions.
<P><HR>
<A NAME="tr95-1554"></A>
<H2>Deciding in Partitionable Networks</H2>
Roy Friedman,  Idit Keidar,  Dalia Malki,  Ken Birman and  Danny Dolev<BR>
TR95-1554<BR>
November 27, 1995<BR>
<p>
Motivated by Chandra and Toueg's work, we study decision protocols in a 
model that closely approximates ``real'' distributed systems. Our results 
show how the weakest failure detector and associated consensus algorithm 
can be adapted to a network in which omission failures can occur during 
periods when processes suspect one another as faulty. For protocols in 
which a majority subset of the participants can reach decisions on behalf 
of the system as a whole, we also characterize a series of stages that 
necessarily arise during execution. Jointly, these findings establish a 
direct relationship between an extended version of the three-phase commit 
protocol, which makes progress even when a traditional three-phase commit 
would block, and the consensus protocol of Chandra and Toueg. Although we 
do not explore the linkage here, our results should also be applicable to 
other agreement protocols for systems of this sort, such as leader election 
and dynamic group membership.<P>
<HR>
<P>
<A NAME="tr95-1537"></A>
<P>
<H2>Strong and Weak Virtual Synchrony in Horus</H2>
 Roy Friedman and  Robbert van Renesse<br>
 August 24, 1995
 <p>
  A formal definition of <em>strong virtual synchrony</em>, capturing the
  semantics of virtual synchrony as implemented in Horus, is
  presented. This definition has the nice property that every message
  is delivered within the view in which it was sent. However, it is
  shown that in order to implement strong virtual synchrony, the
  application program has to block messages during view changes. An
  alternative definition, called <em>weak virtual synchrony</em>, which
  can be implemented without blocking messages, is then presented.
  This definition still guarantees that messages will be delivered
  within the view in which they were sent, but it uses a slightly
  weaker notion of the view in which a message was sent.
  An implementation of weak virtual synchrony that does not block
  messages during view changes is developed, and it is shown how to
  use a system that provides weak virtual synchrony even when strong
  virtual synchrony is actually needed. To capture additional ordering
  requirements, the definition of <em>ordered virtual synchrony</em> is
  presented. Finally, it is discussed how to extend the definitions in
  order to cope with the fact that a process can become a member of
  more than one group.
<P><hr>
<A NAME="tr95-1527"></A>
<P>
<H2>Packing Messages as a Tool for Boosting the 
Performance of Total Ordering Protocols </H2>
 Roy Friedman and  Robbert van Renesse<br>
 July 07, 1995
 <p>
  This paper compares the throughput and latency of four protocols
  that provide total ordering. Two of these protocols are measured
  with and without message packing. We used a technique that buffers
  application messages for a short period of time before sending them,
  so more messages are packed together. The main conclusion of this
  comparison is that message packing influences the performance of
  total ordering protocols under high load overwhelmingly more than
  any other optimization that was checked in this paper, both in terms
  of throughput and latency. This improved performance is attributed
  to the fact that packing messages reduces the header overhead for
  messages, the contention on the network, and the load on the
  receiving CPUs.
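<P>
The packing technique described above can be sketched as follows. This is a
minimal illustration, not the implementation measured in the paper: the
Packer class, its parameter names, the microsecond clock passed in by the
caller, and the default limits are all assumptions made for this example.
<PRE>
```python
class Packer:
    """Hypothetical helper that buffers messages and sends them as one batch."""

    def __init__(self, send_batch, max_delay_us=2000, max_batch=32):
        self.send_batch = send_batch      # callback that transmits a list of messages
        self.max_delay_us = max_delay_us  # longest time a message may wait in the buffer
        self.max_batch = max_batch        # flush immediately once this many are buffered
        self.buffer = []
        self.first_arrival_us = None

    def submit(self, msg, now_us):
        # Remember when the oldest buffered message arrived.
        if not self.buffer:
            self.first_arrival_us = now_us
        self.buffer.append(msg)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def tick(self, now_us):
        # Called periodically; flushes once the oldest message has waited long enough.
        if self.buffer and now_us - self.first_arrival_us >= self.max_delay_us:
            self.flush()

    def flush(self):
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send_batch(batch)


sent = []
packer = Packer(sent.append, max_delay_us=2000, max_batch=3)
packer.submit("a", 0)
packer.submit("b", 1000)
packer.tick(1500)   # too early, nothing is sent yet
packer.tick(2000)   # the delay has elapsed, so "a" and "b" go out together
```
</PRE>
One packed batch carries a single set of headers for several application
messages, which is the source of the savings the abstract attributes to packing.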
<P><hr><p>
<A NAME="tr95-1506"></A>
<H2>Using Virtual Synchrony to Develop Efficient Fault Tolerant
Distributed Shared Memories </H2>
 Roy Friedman<br>
 March 31, 1995
 <p>
  This paper shows how to define consistency conditions for
  distributed shared memories in virtually synchronous environments.
  Such definitions make it possible to develop fault tolerant implementations of
  distributed shared memories, in which during normal execution,
  operations can be performed very efficiently, and only those
  operations which take place during a configuration change must be
  delayed. Three well known consistency conditions, namely,
  linearizability, sequential consistency, and causal memory, are
  redefined for virtually synchronous environments. It is then shown
  how to provide efficient fault tolerant implementations for these
  definitions.
<P><hr><p>
<A NAME="tr95-1505"></A>
<H2>Protocol Composition in Horus</H2>
 Robbert Van Renesse and  Kenneth P. Birman<br>
March 29, 1995
<p>
 Horus is a communication architecture that treats a protocol as an
 abstract data type. Protocol layers can be stacked on top of each
 other in a variety of ways, at run-time. This paper starts by
 describing the many classes of protocols that can be supported this
 way. Next, we describe the Horus object model that we designed for
 this technology, and the interface between the layers that makes it
 all work. We then present an example layer which implements a group
 membership protocol. Then, we look at an example stack of protocols,
 which provides fault-tolerant, totally ordered communication between a
 group of processes. We conclude by presenting some remaining
 challenges in our project.<P>
<hr><p>
<A NAME="tr95-1500"></A>
<H2>Horus: A Flexible Group Communications System</H2>
Robbert Van Renesse,  Kenneth P. Birman,  Bradford B. Glade,  Katie
Guo,  Mark Hayden,  Takako Hickey,  Dalia Malki,  Alex Vaysburd and
Werner Vogels<br>
March 23, 1995
<p>
The Horus system offers flexible group communication support for
distributed applications. It is extensively layered and highly
reconfigurable, allowing applications to only pay for services they
use, and for groups with different communication needs to coexist in a
single system. The approach encourages experimentation with new
communication properties and incremental extension of the system, and
enables us to support a variety of application-oriented interfaces.<P>
<hr><P>
<A NAME="tr95-1493"></A>
<H2>Achieving Critical Reliability With Unreliable Components and
Unreliable Glue</H2>
 Mark Hayden and  Kenneth P. Birman<br>
 March 14, 1995
 <p>
  Even the most aggressive quality assurance procedures yield at best
  probabilistic confidence in the reliability of complex systems.
  Distributed systems, because of their large numbers of components,
  are enormously complex engineering artifacts, and hence may appear
  to be inherently unreliable -- despite the best efforts of
  researchers and developers. A cellular distributed systems
  architecture offers the hope of drastically improving the
  reliability of current technologies in settings where reliability is
  critical. The approach combines a stateful style of distributed
  computing within cells with a loosely coupled probabilistic
  inter-cell computing model based on a probabilistic broadcast
  primitive. We give an implementation of this primitive, called
  pbcast, and demonstrate how to use it to implement this methodology.
  Our approach is compatible with the use of popular distributed
  computing and reliability technologies, while offering considerable
  isolation against the spread of failures among cells.
<P><hr><p>
<A NAME="tr95-1490"></A><H2>Preserving Privacy in a Network of Mobile Computers</H2>
 David A. Cooper and  Kenneth P. Birman<br>
 March 03, 1995
 <p>
  Even as wireless networks create the potential for access to information 
  from mobile platforms, they pose a problem for privacy. In order to 
  retrieve messages, users must periodically poll the network. The 
  information that the user must give to the network could potentially be 
  used to track that user. However, the movements of the user can also be 
  used to hide the user's location if the protocols for sending and 
  retrieving messages are carefully designed. We have developed a 
  replicated memory service which allows users to read from memory without 
  revealing which memory locations they are reading. Unlike previous 
  protocols, our protocol is efficient in its use of computation and 
  bandwidth. In this paper, we will show how this protocol can be used 
  in conjunction with existing privacy preserving protocols to allow a 
  user of a mobile computer to maintain privacy despite active attacks.
<P><hr><p>
<A NAME="tr95-1489"></A>
<H2>Incorporating System Resource Information into Flow Control</H2>
 Takako M. Hickey and  Robbert Van Renesse<br>
February 27, 1995
<p>
 Upcall-based distributed systems have become widespread in recent
 years. While upcall-based systems provide some obvious advantages,
 experiences with these systems have exposed unanticipated problems of
 unpredictability and inefficiency. Incorporating system resource
 information into flow control is essential in solving these problems.
 Variants of window-based flow control suitable for distributed systems
 are investigated. Next, message packing, which improves network
 bandwidth usage efficiency, and, consequently, message throughput, is
 presented. Finally, a back pressure mechanism which controls admission
 of messages into the system by blocking applications at high load is
 presented. The combination of the window mechanism and the back
 pressure mechanism provides end-to-end management of system resources.
 The former manages network resources, while the latter manages
 operating system resources. The combination maintains good throughput
 even under high load.<P>
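<P>
The combination of a send window and back pressure can be illustrated with a
small Python sketch. It is only a simplified model under stated assumptions,
not the mechanism evaluated in the report: the WindowedSender class, the
try_submit and on_ack names, and the fixed window and backlog sizes are
hypothetical.
<PRE>
```python
from collections import deque


class WindowedSender:
    """Hypothetical sender combining a send window with back pressure."""

    def __init__(self, transmit, window=4, backlog=8):
        self.transmit = transmit  # callback that puts one message on the network
        self.window = window      # maximum number of unacknowledged messages
        self.backlog = backlog    # maximum number of messages queued locally
        self.in_flight = 0
        self.pending = deque()

    def try_submit(self, msg):
        """Return False when the application should block (back pressure)."""
        if self.window > self.in_flight and not self.pending:
            # The window has room, so the message goes out immediately.
            self.in_flight += 1
            self.transmit(msg)
            return True
        if self.backlog > len(self.pending):
            # The window is full; hold the message until an acknowledgement arrives.
            self.pending.append(msg)
            return True
        # Both the window and the local queue are full: refuse admission.
        return False

    def on_ack(self):
        # An acknowledgement frees one window slot and releases a queued message.
        if self.in_flight > 0:
            self.in_flight -= 1
        if self.pending:
            self.in_flight += 1
            self.transmit(self.pending.popleft())


wire = []
sender = WindowedSender(wire.append, window=2, backlog=1)
accepted = [sender.try_submit(m) for m in (1, 2, 3, 4)]
```
</PRE>
In this sketch the window limits what is outstanding on the network, while the
refusal returned by try_submit limits what the local system has to buffer.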
<hr><p>
<A NAME="95-1442"></A><H2>Design and Performance of Horus:
A Lightweight Group Communications System</H2>
<P>Robbert van Renesse, Takako M. Hickey, and Kenneth P. Birman<BR>
December 1994
<P>
The Horus project seeks to develop a communication system addressing
the requirements of a wide variety of distributed applications.
Horus implements the <em>group communications</em> model providing
(among others)
unreliable or reliable FIFO, causal, or total group multicasts.
It is extensively layered and highly reconfigurable, allowing
applications to only pay for services they use.
This architecture enables groups with different
communication needs to coexist in a single system.
The approach permits experimentation with new communication
properties and incremental extension of the system, and enables
us to support a variety of application-oriented interfaces.
Our initial experiments show good performance.
<P><HR><P>
<A NAME="nossdav"></A><H2>
Support for Complex Multi-Media Applications using the Horus system.
</H2>
Werner Vogels and Robbert van Renesse 
<BR>
December 1994.
<P>

A distributed multi-media application involves more than just protocols
for the dissemination of video and audio data. As in any other
distributed application, protocols are necessary that guarantee the
consistency, fault-tolerance, and security of shared data objects. The
Horus system offers a framework for building complex distributed
systems that involve any number of protocols, as well as a variety of
protocols for the different aspects of a distributed application
(including some protocols specific to multi-media applications). We
believe that
this integrated approach is superior to combining different toolkits,
and illustrate this with a detailed example of an existing
video-on-demand application.
<P>
<HR>
<P>
<A NAME="tr93-1354"></A>
<H2>A Security Architecture for Fault-Tolerant Systems</H2>
 Michael K. Reiter,  Kenneth P. Birman and  Robbert Van Renesse<br>
June 1993
<p>
 Process groups are a common abstraction for fault-tolerant computing
 in distributed systems. We present a security architecture that
 extends the process group into a security abstraction. Integral parts
 of this architecture are services that securely and fault-tolerantly
 support cryptographic key distribution using novel techniques. We
 detail the design and implementation of these services and the secure
 process group abstraction they support. We also give performance
 figures for some common group operations.
<P>
<hr>
<p>
<A NAME="tr95-1490">
</A><H2>Preserving Privacy in a Network of Mobile Computers</H2>
 David A. Cooper and  Kenneth P. Birman<br>
 October 26, 1994
 <p>
  Even as wireless networks create the potential for access to
  information from mobile platforms, they pose a problem for privacy.
  In order to retrieve messages, users must periodically poll the
  network. The information that the user must give to the network
  could potentially be used to track that user. However, the movements
  of the user can also be used to hide the user's location if the
  protocols for sending and retrieving messages are carefully
  designed. In this paper we will describe a set of protocols that we
  have developed to allow a user with a mobile computer to communicate
  without compromising privacy.
<P><hr><p>

<A NAME="tr94-1447"></A>
<H2>Uniform Actions in Asynchronous Distributed Systems</H2>
 Dalia Malki,  Kenneth P. Birman,  Aleta M. Ricciardi and  Andre
 Schiper<br>
 September 08, 1994
 <p>
  We develop necessary conditions for the development of asynchronous
  distributed software that will perform <em>uniform</em> actions (events
  that if performed by any process, must be performed at all
  processes). The paper focuses on <em>dynamic uniformity</em>, which
  differs from the classical problems in that processes continually
  leave and join the ongoing computation. Here, we first treat a
  static version of the problem (lacking joins), and then extend the
  results so obtained to also include joins. Our results demonstrate
  that in contrast to Consensus, which cannot be solved in
  asynchronous systems with even a single faulty process, dynamic
  uniformity can be solved using a failure detection mechanism that
  makes bounded numbers of mistakes. Because dynamic uniformity arises
  in systems that maintain safety within a ``primary partition'' of a
  network, our paper provides a rigorous characterization of the
  framework upon which several existing distributed programming
  environments are based.<P>
<P><hr><p>
<A NAME="tr93-1355"></A>
<H2>Understanding Partitions and the ``No Partition'' Assumption</H2>
 Aleta M. Ricciardi,  Andre Schiper and  Kenneth P. Birman<br>
 June 1993
 <p>
  The paper discusses partitions in asynchronous message-passing
  systems. In such systems slow processes and slow links can lead to
  virtual partitions that are indistinguishable from real ones. This
  raises the following question: what is a ``partition'' in an
  asynchronous system? To overcome the impossibility of detecting
  crashed processes in an asynchronous system, our system model
  incorporates a failure suspector to detect (possibly erroneously)
  process failures. Based on failure suspicions we give a definition
  of partitions that accounts for real partitions as well as virtual
  ones. We show that under certain assumptions about the process
  behavior, any incorrect failure suspicion inevitably partitions the
  system. We then show how to interpret the ``absence of partition''
  assumption.
<P><hr><p>
<A NAME="tr93-1339"></A>
<H2>Virtually-Synchronous Communication Based on a Weak Failure
Suspector</H2>
 Andre Schiper and  Aleta M. Ricciardi<br>
 April 1993
 <p>
  Failure detectors (or, more accurately, Failure Suspectors - FS)
  appear to be a fundamental service upon which to build
  fault-tolerant, distributed applications. This paper shows that an FS
  with very weak semantics (i.e. that delivers failure and recovery
  information in no specific order) suffices to implement
  virtually-synchronous communication (VSC) in an asynchronous system
  subject to process crash failures and network partitions. The VSC
  paradigm is particularly useful in asynchronous systems and greatly
  simplifies building fault-tolerant applications that mask failures
  by replicating processes. We suggest a three-component architecture
  to implement virtually-synchronous communication: 1) at the lowest
  level, the FS component; on top of it, 2a) a component that defines
  new views, and 2b) a component that reliably multicasts messages
  within a view. The issues covered in this paper also lead to a
  better understanding of the various membership service semantics
  proposed in recent literature.
<P> <hr><p>
<A NAME="tr93-1328"></A>
<H2>Process Membership in Asynchronous Environments</H2>
 Aleta M. Ricciardi and  Kenneth P. Birman<br>
 February 1993
 <p>
  The development of reliable distributed software is simplified by
  the ability to assume a fail-stop failure model. We discuss the
  emulation of such a model in an asynchronous distributed
  environment. The solution we propose, called Strong-GMP, can be
  supported through a highly efficient protocol, and has been
  implemented as part of a distributed systems software project at
  Cornell University. Here, we focus on the precise definition of the
  problem, the protocol, correctness proofs and an analysis of costs.
  Keywords: Asynchronous computation; Fault detection; Process
  membership; Fault tolerance; Process group.
<P><HR>
<EM>
Comments to
<A HREF="mailto:vogels@cs.cornell.edu">Werner Vogels</A>
</EM>
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
